SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles

نویسندگان

  • Michael S. Porter
  • Robert G. Beiko
چکیده

BACKGROUND Homology-based taxonomic assignment is impeded by differences between the unassigned read and reference database, forcing a rank-specific classification to the closest (and possibly incorrect) reference lineage. This assignment may be correct only to a general rank (e.g. order) and incorrect below that rank (e.g. family and genus). Algorithms like LCA avoid this by varying the predicted taxonomic rank based on matches to a set of taxonomic references. LCA and related approaches can be conservative, especially if best matches are taxonomically widespread because of events such as lateral gene transfer (LGT). RESULTS Our extension to LCA called SPANNER (similarity profile annotater) uses the set of best homology matches (the LCA Profile) for a given sequence and compares this profile with a set of profiles inferred from taxonomic reference organisms. SPANNER provides an assignment that is less sensitive to LGT and other confounding phenomena. In a series of trials on real and artificial datasets, SPANNER outperformed LCA-style algorithms in terms of taxonomic precision and outperformed best BLAST at certain levels of taxonomic novelty in the dataset. We identify examples where LCA made an overly conservative prediction, but SPANNER produced a more precise and correct prediction. CONCLUSIONS By using profiles of homology matches to represent patterns of genomic similarity that arise because of vertical and lateral inheritance, SPANNER offers an effective compromise between taxonomic assignment based on best BLAST scores, and the conservative approach of LCA and similar approaches. AVAILABILITY C++ source code and binaries are freely available at http://kiwi.cs.dal.ca/Software/SPANNER. CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accurate Taxonomic Assignment of Short Pyrosequencing Reads

Ambiguities in the taxonomy dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment...

متن کامل

OptiClust, an Improved Method for Assigning Amplicon-Based Sequence Data to Operational Taxonomic Units

Assignment of 16S rRNA gene sequences to operational taxonomic units (OTUs) is a computational bottleneck in the process of analyzing microbial communities. Although this has been an active area of research, it has been difficult to overcome the time and memory demands while improving the quality of the OTU assignments. Here, we developed a new OTU assignment algorithm that iteratively reassign...

متن کامل

Evaluation of Similarity Measures for Template Matching

Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...

متن کامل

Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species

Introduction: FAD is the cofactor of FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Based on applications of this enzyme in biotechnology and industry, it was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...

متن کامل

A procedure for Web Service Selection Using WS-Policy Semantic Matching

In general, Policy-based approaches play an important role in the management of web services, for instance, in the choice of semantic web service and quality of services (QoS) in particular. The present research work illustrates a procedure for the web service selection among functionality similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 29  شماره 

صفحات  -

تاریخ انتشار 2013